TABLE 1.2
Experimental results of some famous binary methods on ImageNet.

                                                      Binarized Acc. (%)      Full-precision Acc. (%)
Methods              Weights  Activations  Model      Top-1       Top-5       Top-1       Top-5
XNOR-Net [199]       Binary   Binary       ResNet-18  51.2        73.2        69.3        89.2
ABC-Net [147]        Binary   Binary       ResNet-50  70.1        89.7        76.1        92.8
LBCNN [109]          Binary   –            –          62.43¹      –           64.94       –
Bi-Real Net [159]    Binary   Binary       ResNet-34  62.2        83.9        73.3        91.3
PCNN [77]            Binary   Binary       ResNet-18  57.3        80.0        69.3        89.2
RBCN [148]           Binary   Binary       ResNet-18  59.5        81.6        69.3        89.2
BinaryDenseNet [12]  –        –            –          62.5        83.9        –           –
BNAS [36]            –        –            –          71.3        90.3        –           –

1.2.1 Image Classification
Image classification assigns each image to one of a set of semantic classes. Many
works use image classification accuracy as the benchmark for judging the success of
BNNs. Five datasets are commonly used for image classification tasks: MNIST [181], SVHN,
CIFAR-10 [122], CIFAR-100, and ImageNet [204]. Among them, ImageNet is the most
difficult to train on and consists of 1000 classes of images. Table 1.2 shows the
experimental results of some of the most popular binary methods on ImageNet.
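
Table 1.2 reports Top-1 and Top-5 accuracy. As a reminder of how such metrics are usually computed, the following is a minimal sketch in plain Python. The function name `topk_accuracy` and the toy scores are hypothetical and chosen only for illustration; they are not taken from any of the cited works.

```python
def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the largest scores for this sample.
        topk = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in topk
    return hits / len(labels)


# Toy example (hypothetical values): 3 samples, 4 classes.
scores = [
    [0.1, 0.6, 0.2, 0.1],  # true class 1: counted as correct for k = 1
    [0.5, 0.1, 0.3, 0.1],  # true class 2: wrong for k = 1, correct for k = 2
    [0.4, 0.3, 0.2, 0.1],  # true class 3: wrong for any k < 4
]
labels = [1, 2, 3]

top1 = topk_accuracy(scores, labels, k=1)
top2 = topk_accuracy(scores, labels, k=2)
```

On ImageNet the same computation is applied with k = 1 and k = 5 over the 1000 class scores of each validation image.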
1.2.2 Speech Recognition
Speech recognition is a technique that enables a program or system to process
human speech. Binary methods can be used to perform speech recognition on edge
computing devices.
Xiang et al. [252] applied binary DNNs to speech recognition tasks. Experiments on
TIMIT phone recognition and 50-hour Switchboard speech recognition show that binary
DNNs can run about four times faster than standard DNNs during inference, with roughly
10.0% relative accuracy reduction.
Zheng et al. [290] and Yin et al. [273] also applied binarized CNNs to speech
recognition tasks.
1.2.3 Object Detection and Tracking
Object detection is the process of locating a target in a scene, while object tracking
follows a target across consecutive frames of a video.
Sun et al. [218] propose a fast object detection algorithm based on BNNs. Compared with
full-precision convolution, the method is, in theory, 62 times faster in convolutional
operations and uses 32 times less memory.
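
Theoretical savings of this kind generally come from storing each weight in 1 bit instead of a 32-bit float and from replacing multiply-accumulate operations with bitwise XNOR and bit counting. The following is a minimal sketch of that generic trick in plain Python. It is not the implementation of Sun et al. [218], and the helper names `pack_signs` and `binary_dot` are hypothetical.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into an int: bit i is 1 where values[i] is +1."""
    bits = 0
    for i, v in enumerate(values):
        if v > 0:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1, +1} vectors, computed from their packed bits."""
    mask = (1 << n) - 1
    # XNOR sets a bit wherever the two signs agree (product +1).
    xnor = ~(a_bits ^ b_bits) & mask
    matches = bin(xnor).count("1")  # popcount
    # Each agreement contributes +1 and each disagreement contributes -1.
    return 2 * matches - n


# Toy binary weight and activation vectors (hypothetical values).
w = [1, -1, -1, 1, 1, -1, 1, 1]
x = [1, 1, -1, -1, 1, -1, -1, 1]

reference = sum(wi * xi for wi, xi in zip(w, x))          # ordinary dot product
fast = binary_dot(pack_signs(w), pack_signs(x), len(w))   # XNOR + popcount

# Storage per weight: 32-bit float versus 1 bit.
FLOAT_BITS, BINARY_BITS = 32, 1
memory_saving = FLOAT_BITS // BINARY_BITS
```

Both routes give the same dot product, and the storage ratio of 32 matches the theoretical memory saving quoted above. Actual speedups depend on the hardware and on how many bits one machine instruction can process.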
¹ 13×13 filter.